Stochastic gradient descent convergence

After $T=\frac{R^2 G'^2}{\epsilon^2}$ iterations, $\mathbb{E}[f(\hat{\mathbf{x}})-f(\mathbf{x}^*)] \leq \epsilon$.
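As a rough numerical illustration of the bound, the sketch below runs SGD on a toy convex objective and measures the gap at the averaged iterate. Everything specific here is an assumption and not part of the note: the objective $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|^2$ (so $\mathbf{x}^* = 0$), Gaussian gradient noise, the step size $\eta = R/(G'\sqrt{T})$, the choice $\hat{\mathbf{x}} = \frac{1}{T}\sum_t \mathbf{x}_t$, and the hypothetical function name `sgd_average_gap`.

```python
import math
import random


def sgd_average_gap(R=1.0, sigma=1.0, dim=5, eps=0.1, seed=0):
    """Run SGD on f(x) = 0.5 * ||x||^2 with Gaussian gradient noise.

    Returns (T, gap), where gap = f(x_hat) - f(x*) for the averaged iterate.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Assumed second-moment bound on the stochastic gradient:
    # E||g||^2 = ||x||^2 + sigma^2 * dim <= R^2 + sigma^2 * dim.
    G = math.sqrt(R**2 + sigma**2 * dim)
    T = math.ceil((R * G / eps) ** 2)  # T = R^2 G'^2 / eps^2, rounded up
    eta = R / (G * math.sqrt(T))       # assumed fixed step size

    x = [R / math.sqrt(dim)] * dim     # start at distance R from x* = 0
    x_sum = [0.0] * dim
    for _ in range(T):
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
        # Unbiased stochastic gradient: true gradient x plus zero-mean noise.
        g = [xi + rng.gauss(0.0, sigma) for xi in x]
        x = [xi - eta * gi for xi, gi in zip(x, g)]

    x_hat = [s / T for s in x_sum]          # averaged iterate
    gap = 0.5 * sum(v * v for v in x_hat)   # f(x_hat) - f(x*), with f(x*) = 0
    return T, gap


if __name__ == "__main__":
    T, gap = sgd_average_gap()
    print(f"T = {T}, f(x_hat) - f(x*) = {gap:.5f}")
```

The guarantee holds in expectation, so a single run is only suggestive. On this strongly convex toy problem the observed gap is typically far below $\epsilon$, because the bound is a worst-case statement over all convex objectives.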

Proof

Using Jensen’s inequality:
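A sketch of the standard argument follows. It rests on assumptions the note does not state: $f$ is convex, the update is $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta\,\mathbf{g}_t$ with $\mathbb{E}[\mathbf{g}_t \mid \mathbf{x}_t] = \nabla f(\mathbf{x}_t)$ and $\mathbb{E}\|\mathbf{g}_t\|^2 \leq G'^2$, the starting point satisfies $\|\mathbf{x}_1 - \mathbf{x}^*\| \leq R$, the output is the average $\hat{\mathbf{x}} = \frac{1}{T}\sum_{t=1}^{T}\mathbf{x}_t$, and the step size is $\eta = \frac{R}{G'\sqrt{T}}$.

$$
\begin{aligned}
f(\hat{\mathbf{x}}) - f(\mathbf{x}^*)
  &\leq \frac{1}{T}\sum_{t=1}^{T}\bigl(f(\mathbf{x}_t) - f(\mathbf{x}^*)\bigr)
  && \text{(Jensen)} \\
\mathbb{E}\bigl[f(\mathbf{x}_t) - f(\mathbf{x}^*)\bigr]
  &\leq \mathbb{E}\bigl[\nabla f(\mathbf{x}_t)^\top(\mathbf{x}_t - \mathbf{x}^*)\bigr]
   = \mathbb{E}\bigl[\mathbf{g}_t^\top(\mathbf{x}_t - \mathbf{x}^*)\bigr]
  && \text{(convexity, unbiasedness)} \\
\mathbf{g}_t^\top(\mathbf{x}_t - \mathbf{x}^*)
  &= \frac{\|\mathbf{x}_t - \mathbf{x}^*\|^2 - \|\mathbf{x}_{t+1} - \mathbf{x}^*\|^2}{2\eta}
   + \frac{\eta}{2}\|\mathbf{g}_t\|^2
  && \text{(expand } \|\mathbf{x}_{t+1} - \mathbf{x}^*\|^2\text{)} \\
\mathbb{E}\bigl[f(\hat{\mathbf{x}}) - f(\mathbf{x}^*)\bigr]
  &\leq \frac{R^2}{2\eta T} + \frac{\eta G'^2}{2}
   = \frac{R G'}{\sqrt{T}}
  && \text{(telescope, then substitute } \eta\text{)}
\end{aligned}
$$

Setting $T = \frac{R^2 G'^2}{\epsilon^2}$ makes the right-hand side equal to $\epsilon$. With a projection step onto a convex set containing $\mathbf{x}^*$, the third line becomes an inequality and the rest goes through unchanged, since projection does not increase the distance to $\mathbf{x}^*$.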

#incomplete


see: Stochastic gradient descent

